Resolving species names rapidly and accurately with taxastand

Joel Nitta¹, Wataru Iwasaki¹

¹ The University of Tokyo

Botany 2022
https://joelnitta.github.io/botany_2022_taxastand

Species names are the “glue” that connects datasets

Page (2013)

Synonyms break linkages

In the age of big data, software is needed to resolve species names

Shortcomings of current approaches

  • Many tools are only available via an online interface (API)
    • Results are difficult to reproduce
  • Limited number of reference databases to choose from
    • Users may not be able to apply their taxonomy of choice
  • Existing tools do not recognize the rules of taxonomic nomenclature
    • Names may not be matched accurately

Features of taxastand

  • Runs locally in R
  • Allows usage of a custom reference database
  • Supports fuzzy matching
  • Understands taxonomic rules

Available at https://github.com/joelnitta/taxastand

Usage

Installation

In R:

# install remotes first
install.packages("remotes")
remotes::install_github("joelnitta/taxastand")
library(taxastand)
library(dplyr) # provides glimpse(), used to inspect results below

You also need to install either taxon-tools or Docker.
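Because the matching step runs outside R, it can help to check which option is available before calling taxastand. The sketch below uses base R's Sys.which() and assumes, as an unverified assumption, that a local taxon-tools install puts a `matchnames` executable on the PATH; the variable names are illustrative, not part of taxastand.

```r
# Assumption: taxon-tools provides a `matchnames` executable on the PATH.
has_taxon_tools <- nzchar(Sys.which("matchnames"))
has_docker <- nzchar(Sys.which("docker"))

if (!has_taxon_tools && !has_docker) {
  message("Neither taxon-tools nor Docker found; install one of them first.")
}

# Use Docker only when it is the sole option available
use_docker <- unname(!has_taxon_tools && has_docker)
```

The resulting flag could then be passed as `docker = use_docker` to the taxastand functions shown below.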

Basic matching: fuzzy matching

res <- ts_match_names(
    query = "Crepidomanes minutus",
    reference = c(
      "Crepidomanes minutum",
      "Hymenophyllum polyanthos"),
    simple = TRUE,
    docker = TRUE
    )
glimpse(res)
Rows: 1
Columns: 3
$ query      <chr> "Crepidomanes minutus"
$ reference  <chr> "Crepidomanes minutum"
$ match_type <chr> "auto_fuzzy"
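To illustrate the idea behind fuzzy matching, here is a minimal base-R sketch that picks the reference name with the smallest edit distance (via utils::adist()). This is not how taxastand works internally, since it delegates matching to taxon-tools, whose algorithm may differ. The helper `closest_name()` and its `max_dist` threshold are hypothetical.

```r
# Hypothetical helper: return the closest reference name by edit distance,
# or NA if even the best candidate is too different.
closest_name <- function(query, reference, max_dist = 2) {
  d <- adist(query, reference)[1, ]  # generalized Levenshtein distances
  best <- which.min(d)
  if (d[best] <= max_dist) reference[best] else NA_character_
}

reference <- c("Crepidomanes minutum", "Hymenophyllum polyanthos")
closest_name("Crepidomanes minutus", reference)
# returns "Crepidomanes minutum" (one substitution: s -> m)
```

A fixed distance threshold is the simplest design choice; it ignores the structure of the name, which is why rule-aware matching (next example) matters.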

Basic matching: taxonomic rules

res <- ts_match_names(
    query = "Crepidomanes minutum K. Iwats.",
    reference = c(
      "Crepidomanes minutum (Bl.) K. Iwats.",
      "Hymenophyllum polyanthos (Sw.) Sw."),
    simple = TRUE,
    docker = TRUE
    )
glimpse(res)
Rows: 1
Columns: 3
$ query      <chr> "Crepidomanes minutum K. Iwats."
$ reference  <chr> "Crepidomanes minutum (Bl.) K. Iwats."
$ match_type <chr> "auto_basio-"
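A rule-aware matcher treats a name as structured parts rather than a plain string. The base-R sketch below assumes the simple form "Genus epithet (basionym author) combining author" and shows why the query above still matches: genus, epithet, and combining author agree, and only the parenthetical basionym author is missing. The helpers `parse_name()` and `compare_names()` and their return labels are hypothetical; the label "basionym_author_missing" only roughly parallels the "auto_basio-" match type reported by taxastand.

```r
# Hypothetical helper: split a name into genus, epithet, basionym author
# (text in parentheses) and combining author (everything after the epithet).
parse_name <- function(name) {
  basio <- if (grepl("\\(", name)) sub(".*\\(([^)]*)\\).*", "\\1", name) else ""
  rest <- gsub(" +", " ", trimws(sub("\\([^)]*\\)", "", name)))
  words <- strsplit(rest, " ", fixed = TRUE)[[1]]
  list(genus = words[1], epithet = words[2], basionym_author = basio,
       author = paste(words[-(1:2)], collapse = " "))
}

# Hypothetical classifier for a query/reference pair
compare_names <- function(query, ref) {
  q <- parse_name(query)
  r <- parse_name(ref)
  if (q$genus != r$genus || q$epithet != r$epithet) return("no_match")
  if (identical(q, r)) return("exact")
  if (q$author == r$author && q$basionym_author == "" && r$basionym_author != "")
    return("basionym_author_missing")
  "other"
}

compare_names("Crepidomanes minutum K. Iwats.",
              "Crepidomanes minutum (Bl.) K. Iwats.")
# returns "basionym_author_missing"
```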

For name resolution, you need a reference database

data(filmy_taxonomy)
head(filmy_taxonomy[c("taxonID", "acceptedNameUsageID",
  "taxonomicStatus", "scientificName")])
   taxonID acceptedNameUsageID taxonomicStatus                            scientificName
1 54115096                  NA   accepted name             Cephalomanes atrovirens Presl
2 54133783            54115097         synonym                Trichomanes crassum Copel.
3 54115097                  NA   accepted name Cephalomanes crassum (Copel.) M. G. Price
4 54133784            54115098         synonym           Trichomanes densinervium Copel.
5 54115098                  NA   accepted name Cephalomanes densinervium (Copel.) Copel.
6 54133785            54115099         synonym         Trichomanes infundibulare Alderw.

Where to get taxonomic data?

Name resolution

res <- ts_resolve_names(
  query = "Gonocormus minutum",
  ref_taxonomy = filmy_taxonomy,
  docker = TRUE)
glimpse(res)
Rows: 1
Columns: 6
$ query           <chr> "Gonocormus minutum"
$ resolved_name   <chr> "Crepidomanes minutum (Bl.) K. Iwats."
$ matched_name    <chr> "Gonocormus minutus (Bl.) Bosch"
$ resolved_status <chr> "accepted name"
$ matched_status  <chr> "synonym"
$ match_type      <chr> "auto_fuzzy"
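Conceptually, resolution happens in two steps: match the query to a name in the reference, then, if that name is a synonym, follow its acceptedNameUsageID to the accepted name. The base-R sketch below illustrates only the second step, using the first five rows of filmy_taxonomy printed earlier. `resolve_matched()` is a hypothetical helper, not part of taxastand, which may handle cases such as ambiguous matches differently.

```r
# First five rows of filmy_taxonomy, as printed by head() above
ref <- data.frame(
  taxonID = c(54115096L, 54133783L, 54115097L, 54133784L, 54115098L),
  acceptedNameUsageID = c(NA, 54115097L, NA, 54115098L, NA),
  taxonomicStatus = c("accepted name", "synonym", "accepted name",
                      "synonym", "accepted name"),
  scientificName = c("Cephalomanes atrovirens Presl",
                     "Trichomanes crassum Copel.",
                     "Cephalomanes crassum (Copel.) M. G. Price",
                     "Trichomanes densinervium Copel.",
                     "Cephalomanes densinervium (Copel.) Copel."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow a matched name to its accepted name
resolve_matched <- function(matched_name, ref) {
  row <- ref[ref$scientificName == matched_name, , drop = FALSE]
  if (nrow(row) != 1) return(NA_character_)   # no match or ambiguous
  if (is.na(row$acceptedNameUsageID)) return(row$scientificName)  # accepted
  ref$scientificName[match(row$acceptedNameUsageID, ref$taxonID)]
}

resolve_matched("Trichomanes crassum Copel.", ref)
# returns "Cephalomanes crassum (Copel.) M. G. Price"
```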

Example: ferns of Japan

https://github.com/joelnitta/ja_ferns_names

How can we map the endangered fern species of Japan?

  • GreenList: Conservation status
  • GBIF: Distribution data

GreenList and GBIF do not use the same taxonomy.

Solution: match the names in both datasets to pteridocat

  1. Match GBIF to pteridocat
  2. Match GreenList to pteridocat
  3. Merge GreenList and GBIF
  4. Compare to Ebihara and Nitta (2019) (non-GBIF data)
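Once both datasets carry names resolved against the same reference (steps 1 and 2), the merge in step 3 reduces to an ordinary join on the resolved name. The base-R sketch below uses hypothetical placeholder taxa and values, not real GBIF or GreenList records, and assumes both tables have a `resolved_name` column.

```r
# Hypothetical toy data (placeholder taxa, not real records)
gbif <- data.frame(
  resolved_name = c("Taxon A", "Taxon A", "Taxon B", "Taxon C"),
  longitude = c(139.7, 139.8, 127.7, 141.3),
  latitude = c(35.7, 35.6, 26.2, 43.1),
  stringsAsFactors = FALSE
)
greenlist <- data.frame(
  resolved_name = c("Taxon A", "Taxon C", "Taxon D"),
  status = c("VU", "EN", "CR"),
  stringsAsFactors = FALSE
)

# Inner join: each occurrence gains its conservation status; taxa missing
# from either table (Taxon B, Taxon D) drop out.
merged <- merge(gbif, greenlist, by = "resolved_name")
```

The join is only as reliable as the name resolution upstream. Without a shared taxonomy, synonyms would silently drop out at this step.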

Results

Unmatched names in the GBIF data are likely artifacts

Of the 1,092 species (331,453 occurrences) in the GBIF data,
770 names (302,985 occurrences) resolved to names in the GreenList

Match type                  Number of names
Full match                              516
Difference in punctuation               196
Missing author                           22
Taxonomic rule                           20
Fuzzy match                              16
TOTAL                                   770
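The figures above can be checked with a little arithmetic. The sketch below uses only numbers reported on this slide. Resolved names make up about 70.5% of names but about 91.4% of occurrences. Unresolved names average roughly 88 occurrences each, versus roughly 393 for resolved names, which is consistent with the point that unmatched names tend to be rare and likely artifacts.

```r
# Figures reported above for the GBIF data
n_names <- 1092
n_occ <- 331453
n_names_resolved <- 770
n_occ_resolved <- 302985
match_counts <- c("Full match" = 516, "Difference in punctuation" = 196,
                  "Missing author" = 22, "Taxonomic rule" = 20,
                  "Fuzzy match" = 16)

# The match-type counts should add up to the number of resolved names
stopifnot(sum(match_counts) == n_names_resolved)

pct_names <- round(100 * n_names_resolved / n_names, 1)  # 70.5
pct_occ <- round(100 * n_occ_resolved / n_occ, 1)        # 91.4

# Mean occurrences per name: unresolved (~88.4) vs resolved (~393.5)
mean_occ_unresolved <- (n_occ - n_occ_resolved) / (n_names - n_names_resolved)
mean_occ_resolved <- n_occ_resolved / n_names_resolved
```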

Summary

taxastand enables reliable, customizable taxonomic name resolution

  • Main feature: use of a custom reference taxonomy
    • Advantage: can be adapted to different projects
    • Disadvantage: preparing and maintaining a reference database is not simple
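Part of maintaining a custom reference database is checking that it is internally consistent before using it. The base-R sketch below flags two common problems in a table shaped like filmy_taxonomy: duplicated taxonID values, and synonyms whose acceptedNameUsageID does not point to any taxonID in the table. `check_ref_taxonomy()` is a hypothetical helper, not a taxastand function, and assumes synonyms are labeled "synonym" as in filmy_taxonomy.

```r
# Hypothetical helper: report basic consistency problems in a
# Darwin Core-style reference taxonomy.
check_ref_taxonomy <- function(ref) {
  problems <- character(0)
  if (anyDuplicated(ref$taxonID) > 0)
    problems <- c(problems, "duplicated taxonID")
  syn <- ref$taxonomicStatus %in% "synonym"
  # Every synonym should point to a taxonID present in the table
  dangling <- syn & !(ref$acceptedNameUsageID %in% ref$taxonID)
  if (any(dangling))
    problems <- c(problems, "synonym with missing accepted name")
  problems
}

# Placeholder example: one accepted name and one synonym
good <- data.frame(
  taxonID = c(1L, 2L),
  acceptedNameUsageID = c(NA, 1L),
  taxonomicStatus = c("accepted name", "synonym"),
  scientificName = c("Taxon A", "Taxon B"),
  stringsAsFactors = FALSE
)
bad <- good
bad$acceptedNameUsageID[2] <- 99L  # points to a taxonID that does not exist

check_ref_taxonomy(good)  # character(0): no problems found
check_ref_taxonomy(bad)   # "synonym with missing accepted name"
```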

Please choose the tool that works best for you!
(see Grenié et al. 2022)

Acknowledgements

Ebihara, A., and J. H. Nitta. 2019. An update and reassessment of fern and lycophyte diversity data in the Japanese Archipelago. Journal of Plant Research 132:723–738.
Grenié, M., E. Berti, J. Carvajal-Quintero, G. M. L. Dädlow, A. Sagouis, and M. Winter. 2022. Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods in Ecology and Evolution. doi:10.1111/2041-210X.13802.
Page, R. D. M. 2013. BioNames: linking taxonomy, texts, and trees. PeerJ 1:e190.